Abstract: We investigate the information-theoretic throughput
achievable on a fading communication link when the receiver is
equipped with one-bit analog-to-digital converters (ADCs). The
analysis is conducted for the setting where neither the transmitter
nor the receiver has a priori information on the realization of
the fading channels. This means that channel-state information
needs to be acquired at the receiver on the basis of the one-bit
quantized channel outputs. We show that least-squares (LS) channel
estimation combined with joint pilot and data processing is
capacity achieving in the single-user, single-receive-antenna case.

We also investigate the achievable uplink throughput in a massive
multiple-input multiple-output system where each element of
the antenna array at the receiver base-station feeds a one-bit ADC.
We show that LS channel estimation and maximum-ratio combining
are sufficient to support both multiuser operation and the use
of high-order constellations. This holds in spite of the severe
nonlinearity introduced by the one-bit ADCs.

I. Introduction 

Digital signal processing (DSP) is an integral part of all modern
communication systems. In order to process data digitally,
the analog baseband signal has to be mapped to the digital 
domain. This requires conversion both in time (sampling) and 
amplitude (quantization). The circuit that performs this last 
operation, known as analog-to-digital converter (ADC), is a 
necessary component in every system that includes DSP. An 
ADC with sampling frequency $f_s$ and resolution of $n$ bits maps the
continuous-amplitude samples into a set of $2^n$ quantization
levels, by operating $f_s 2^n$ conversion steps per second. A crucial
problem with modern ADCs is that the power dissipated per
conversion step (a.k.a. Walden's figure of merit [?]) increases
dramatically for sampling rates higher than about 100 MHz [?].
This implies that, for wideband communication systems, the 
resolution of the ADCs must be kept low to maintain a power 
budget that is within acceptable levels. 

The one-bit resolution case, where the in-phase and the quadrature
components of the continuous-valued received samples are
quantized separately using one-bit ADCs (zero-threshold comparators),
is particularly attractive, because of the resulting low
hardware complexity. Indeed, in such a one-bit ADC architecture,
there is no need for an automatic gain controller. Communication
systems employing one-bit ADCs have been previously analyzed
in the context of low-power ultra-wideband systems [?]-[?], and,
more recently, in the context of millimeter-wave communication
systems [?], [?] and massive (or large-scale) multiple-input
multiple-output (MIMO) systems [?]. In ultra-wideband and
in millimeter-wave systems, the motivation for using one-bit
ADCs is the large bandwidth of the transmitted signal. In massive
MIMO systems, an additional reason is the large
number of radio-frequency chains at the base-station (BS), which
makes the use of low-cost solutions, such as one-bit ADCs,
attractive [?].

Previous Results: A receiver employing one-bit ADCs 
needs to cope with the severe nonlinearity introduced by such 
quantizers. In their presence, the signaling schemes and the 
receiver algorithms employed for the case of high-resolution 
quantizers become suboptimal. The impact of the one-bit ADC 
nonlinearity on the performance of communication systems has 
been analyzed to some extent in the literature. In [10], it is
proven that 2-PAM is capacity achieving over a real-valued 
nonfading single-input single-output (SISO) Gaussian channel. 
For complex-valued Gaussian channels, QPSK turns out to be 
optimal. For the general MIMO case, QPSK is not capacity 
achieving, and the capacity-achieving distribution is unknown. 

These results hold under the assumption that the one-bit ADC 
is a zero-threshold comparator. It turns out that in the low-SNR
regime, a zero-threshold comparator is not optimal [11].
The optimal strategy involves the use of flash signaling [12,
Def. 2] and requires an optimization over the threshold value.
Unfortunately, the power gain obtainable using this optimal 
strategy manifests itself only at extremely low values of spectral 
efficiency. In the remainder of the paper, we therefore exclusively 
focus on the zero-threshold architecture. 

Moving to Rayleigh-fading channels, QPSK is capacity 
achieving (again for the SISO case) under the assumption that the 
receiver somehow has access to perfect channel state information
(CSI). The assumption that perfect CSI is available is,
however, not realistic in the one-bit quantized case, since the 
nonlinear distortion caused by the one-bit quantizers makes it 
challenging to estimate the fading process perfectly. For the 
more practically relevant case when the channel is not known a 
priori to the receiver, but must be learnt (for example, via pilot 
symbols), QPSK is optimal when the SNR exceeds a certain 
threshold that depends on the coherence time of the fading 
process [14]. For SNR values that are below this threshold, on-off
QPSK is capacity achieving [?].

In [?], crude high-SNR bounds are obtained for the capacity
of one-bit-quantized MIMO fading channels, under the ideal
assumption that perfect CSI is available to both the transmitter
and the receiver. Risi et al. [?] recently provided a lower bound
on the throughput achievable on massive MIMO uplink channels,
when the BS employs one-bit ADCs. The bound suggests that,
in some scenarios, massive MIMO may be robust against the
coarse output quantization resulting from the use of one-bit
ADCs. However, as the lower bound of Risi et al. is based on
a suboptimal input distribution, i.e., QPSK, and a suboptimal
detection algorithm, i.e., least-squares (LS) channel estimation
followed by maximal-ratio combining (MRC) or zero-forcing,
its tightness is unclear.

All the results reviewed so far hold under the assumption 
of Nyquist-rate sampling at the receiver. It is worth pointing 
out that Nyquist-rate sampling is not necessarily optimal in the 
presence of quantization at the receiver [?], [?]. Indeed, higher
information rates can be achieved by oversampling the received
signal. For example, for the complex AWGN case, high-order
constellations such as 16-QAM can be supported in the SISO
case if one allows for oversampling at the receiver [?].

Contributions: Focusing on Nyquist-rate sampling, and on
the scenario where neither the transmitter nor the receiver has
a priori CSI, we investigate the rates achievable over Rayleigh
block-fading MIMO channels when the receiver is equipped with
one-bit ADCs. Our contribution is twofold:


For the SISO case, we prove that LS channel estimation performed
jointly on pilot and data symbols is capacity achieving.
In the infinite precision (no quantization) case, the benefit
of joint pilot-data (JPD) processing has been illustrated, e.g.,
in [18]-[20], where it is shown that joint processing yields
a smaller gap to capacity compared to separate pilot/data
processing. Our result shows that, in the one-bit ADC case,
the gap to capacity is actually zero. Moreover, LS estimation,
although inferior to the optimal maximum a posteriori probability
estimator (see [?], [?]), suffices to achieve capacity
when combined with JPD processing.

We also consider the uplink of a massive MIMO system where
single-antenna users communicate with a BS equipped with
a large antenna array whose elements feed one-bit ADCs.
Generalizing the analysis presented in [?], we show that
MRC combined with LS channel estimation at the BS is
sufficient to support both multi-user operation and the use of
high-order constellations such as 16-QAM. Furthermore, the
rates achievable with 16-QAM turn out to exceed the ones
reported in [?] for QPSK, for SNR values as low as -15 dB,
and for antenna arrays of 100 elements or more. Our result
suggests that temporal oversampling, as proposed in [17],
can be replaced by spatial oversampling through the use of a
massive antenna array at the BS.


II. System Model

We consider a single-cell uplink system as depicted in Fig. 1,
where $K$ single-antenna users are served by a BS that is equipped
with an array of $N \geq K$ antennas. We model the subchannels
between each transmit-receive antenna pair as a Rayleigh block-fading
channel (see, e.g., [20]), i.e., a channel that stays constant
for $T$ channel uses, and evolves independently across blocks of
$T$ channel uses. We shall refer to $T$ as the channel coherence
time (measured in channel uses). We further assume that the
subchannels are mutually independent. The discrete-time complex
baseband received signal over all antennas within an arbitrary
coherence block and before quantization is modelled as

Fig. 1. One-bit massive MIMO uplink system model.

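$$\mathbf{Y} = \mathbf{X}\mathbf{H} + \mathbf{W}. \tag{1}$$

Here, $\mathbf{X} \in \mathbb{C}^{T \times K}$ denotes the channel input, and
$\mathbf{H} \in \mathbb{C}^{K \times N}$ is the channel matrix connecting the $K$ users
to the $N$ BS antennas. The entries of $\mathbf{H}$ are independent and
$\mathcal{CN}(0, 1)$ distributed. Furthermore, the matrix
$\mathbf{W} \in \mathbb{C}^{T \times N}$, whose entries are independent and
$\mathcal{CN}(0, 1)$ distributed, stands for the AWGN.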

The real and the imaginary components of the received signal
at each antenna are quantized separately using a one-bit ADC.
Let $\mathcal{R} = \{1 + j, -1 + j, -1 - j, 1 - j\}$ be the set of possible
quantization outcomes. It will be convenient to describe the
joint operation of all $2N$ one-bit ADCs at the BS through
the function $Q(\cdot) : \mathbb{C}^{T \times N} \to \mathcal{R}^{T \times N}$ that maps the output
matrix $\mathbf{Y}$ with entries $\{y_{t,n}\}$ into the quantized output matrix
$\mathbf{R}$ with entries $\{r_{t,n}\}$ according to
$\Re\{r_{t,n}\} = \mathrm{sign}\{\Re\{y_{t,n}\}\}$
and $\Im\{r_{t,n}\} = \mathrm{sign}\{\Im\{y_{t,n}\}\}$. Using this convention, we can
write the quantized output matrix as

$$\mathbf{R} = Q(\mathbf{Y}) = Q(\mathbf{X}\mathbf{H} + \mathbf{W}). \tag{2}$$
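The quantized block-fading model can be simulated in a few lines. The following is a minimal sketch, not the authors' code: it draws one coherence block of the channel matrix and the noise, and applies the one-bit quantizer entry by entry. The helper names (`one_bit_quantize`, `cn01`, `simulate_block`) are hypothetical, and mapping a zero input to +1 is an assumption, since the text does not specify sign(0).

```python
import random

def one_bit_quantize(y):
    """One-bit ADC pair: sign of the real and of the imaginary part.
    Assumption: an exact zero is mapped to +1."""
    re = 1.0 if y.real >= 0 else -1.0
    im = 1.0 if y.imag >= 0 else -1.0
    return complex(re, im)

def cn01(rng):
    """One CN(0, 1) sample, i.e. variance 1/2 per real dimension."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def simulate_block(X, N, rng):
    """Return R = Q(XH + W) for a T x K input X given as a list of rows."""
    T, K = len(X), len(X[0])
    # Channel matrix H (K x N), constant over the whole coherence block.
    H = [[cn01(rng) for _ in range(N)] for _ in range(K)]
    R = []
    for t in range(T):
        row = []
        for n in range(N):
            y = sum(X[t][k] * H[k][n] for k in range(K)) + cn01(rng)
            row.append(one_bit_quantize(y))
        R.append(row)
    return R

rng = random.Random(0)
# Unit-power QPSK inputs for K = 2 users over T = 4 channel uses.
qpsk = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]
X = [[rng.choice(qpsk) for _ in range(2)] for _ in range(4)]
R = simulate_block(X, N=8, rng=rng)
```

Every entry of `R` lies in the four-point set of quantization outcomes, regardless of the input amplitude. This is why no automatic gain control is needed, and also why amplitude information is lost at a single antenna.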

We shall consider the scenario where neither the users nor the
BS are aware of the realizations of the channel matrix $\mathbf{H}$ (no a
priori CSI), and where coding is performed over many coherence
blocks. For this scenario, the channel sum-rate capacity $C$ is

$$C(\rho) = \frac{1}{T} \sup I(\mathbf{X}; \mathbf{R}) \tag{3}$$

where the supremum is computed over all input probability
distributions on $\mathbf{X}$ for which the columns of $\mathbf{X}$ are independent,
and the following average-power constraint holds:

$$\mathbb{E}\left[\mathrm{tr}\{\mathbf{X}\mathbf{X}^H\}\right] \leq KT\rho. \tag{4}$$

Since the noise variance is normalized to one, we can think of $\rho$
as the SNR. The sum-rate capacity (3) is, in general, not known
in closed form, even in the infinite-precision case, for which
tight bounds have been recently reported in [?].

III. SISO Case

We focus in this section on the scenario where there is only
a single active user in the system and where the BS has a
single receive antenna. For this case, the input-output relation (2)
reduces to


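$$\mathbf{r} = Q(\mathbf{y}) = Q(\mathbf{x}h + \mathbf{w}). \tag{5}$$

Here, $\mathbf{x} \in \mathbb{C}^{T}$ and $\mathbf{w} \in \mathbb{C}^{T}$ are the input and noise vectors,
respectively, and $h$ denotes the fading channel, which remains
constant over the coherence block. For this case, the capacity (3)
is known [?, Th. 1] and given by

$$C(\rho) = \begin{cases} \dfrac{\rho}{\rho_c} R_{\mathrm{QPSK}}(\rho_c), & \rho < \rho_c \\ R_{\mathrm{QPSK}}(\rho), & \rho \geq \rho_c. \end{cases} \tag{6}$$

Here, $R_{\mathrm{QPSK}}$ denotes the rate achievable with QPSK,

$$R_{\mathrm{QPSK}}(\rho) = 2 + \frac{2}{T} \sum_{k=0}^{T} \binom{T}{k} \beta(k, T-k) \log_2 \beta(k, T-k) \tag{7}$$

where

$$\beta(a, b) = \mathbb{E}_g\left[\Phi(-g\sqrt{\rho})^{a}\, \Phi(g\sqrt{\rho})^{b}\right] \tag{8}$$

with $g \sim \mathcal{N}(0, 1)$ and $\Phi(x)$ denoting the cumulative distribution
function of a standard normal random variable. Furthermore,
the SNR threshold $\rho_c$ in (6) is the solution of the following
optimization problem:

$$\rho_c = \arg\max_{\rho > 0} \frac{1}{\rho} R_{\mathrm{QPSK}}(\rho). \tag{9}$$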
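The SISO capacity expression can be evaluated numerically. The sketch below is illustrative only and rests on stated assumptions: $\beta(a, b) = \mathbb{E}_g[\Phi(-g\sqrt{\rho})^a \Phi(g\sqrt{\rho})^b]$ with $g$ standard normal, the QPSK rate $2 + (2/T)\sum_k \binom{T}{k}\beta(k, T-k)\log_2\beta(k, T-k)$, and time sharing below the threshold $\rho_c$. The expectation is approximated by a trapezoidal rule on $[-8, 8]$, and $\rho_c$ is located by a coarse 1 dB grid search, which is a simplification of the maximization over $\rho > 0$. All function names are hypothetical.

```python
import math

def Phi(x):
    """CDF of a standard normal random variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Trapezoidal rule for E_g[.] with g ~ N(0, 1), truncated to [-8, 8].
STEP = 0.02
NODES = [-8.0 + i * STEP for i in range(801)]
WEIGHTS = [STEP * math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)
           for g in NODES]

def beta(a, b, rho):
    """E_g[Phi(-g sqrt(rho))^a * Phi(g sqrt(rho))^b], numerically."""
    s = math.sqrt(rho)
    return sum(w * Phi(-g * s) ** a * Phi(g * s) ** b
               for g, w in zip(NODES, WEIGHTS))

def r_qpsk(rho, T):
    """Rate achievable with QPSK for coherence time T (bits/channel use)."""
    total = 0.0
    for k in range(T + 1):
        b = beta(k, T - k, rho)
        if b > 0.0:  # skip terms that underflow to zero
            total += math.comb(T, k) * b * math.log2(b)
    return 2.0 + 2.0 * total / T

def capacity(rho, T):
    """Return (C(rho), rho_c); rho_c is found on a -30..30 dB grid."""
    grid = [10 ** (d / 10.0) for d in range(-30, 31)]
    rho_c = max(grid, key=lambda r: r_qpsk(r, T) / r)
    if rho < rho_c:
        # On-off signaling: time-share between silence and rho_c.
        return rho / rho_c * r_qpsk(rho_c, T), rho_c
    return r_qpsk(rho, T), rho_c

c_high, rho_c = capacity(10.0, 4)   # rho = 10 dB, T = 4
c_low, _ = capacity(0.01, 4)        # rho = -20 dB, T = 4
```

The grid resolution limits the accuracy of $\rho_c$; a finer search or a scalar optimizer could replace the grid if the threshold itself is of interest.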


A common approach to transmitting information over fading 
channels whose realizations are not known a priori to the receiver 
is to reserve a certain number of channel uses at the beginning of 
each coherence block for the transmission of pilot symbols [25].
These pilot symbols are then used at the receiver to estimate
the fading channel. Assume that $0 < P < T$ pilots are
used and let $\mathbf{x}^{(P)}$ be the $P$-dimensional vector containing these
pilot symbols. Similarly, let $\mathbf{r}^{(P)}$ be the corresponding one-bit
quantized channel output. The pair $(\mathbf{x}^{(P)}, \mathbf{r}^{(P)})$ can be used at
the receiver to estimate the channel $h$. As in [?], we shall focus
on LS estimation because of its low complexity. When the pilots
are QPSK symbols, the LS estimate $\hat{h}$ of $h$ is

$$\hat{h} = \frac{1}{P\sqrt{\rho}} \left(\mathbf{x}^{(P)}\right)^{H} \mathbf{r}^{(P)}. \tag{10}$$
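A minimal sketch of LS estimation from one-bit pilot observations follows. It assumes QPSK pilots with per-symbol power $\rho$ and the normalization $1/(P\sqrt{\rho})$, which is our reading of the estimator above and should be treated as an assumption. The scaling does not affect the phase of the estimate. The helper names are hypothetical.

```python
import cmath
import math
import random

def one_bit_quantize(y):
    """Sign of real and imaginary part; an exact zero maps to +1 (assumption)."""
    return complex(1.0 if y.real >= 0 else -1.0,
                   1.0 if y.imag >= 0 else -1.0)

def cn01(rng):
    """One CN(0, 1) sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def ls_estimate(pilots, r_pilots, rho):
    """h_hat = x^H r / (P sqrt(rho)); the normalization is an assumption."""
    P = len(pilots)
    inner = sum(x.conjugate() * r for x, r in zip(pilots, r_pilots))
    return inner / (P * math.sqrt(rho))

rng = random.Random(1)
rho, P = 1.0, 20
amp = math.sqrt(rho / 2.0)
pilots = [complex(rng.choice((amp, -amp)), rng.choice((amp, -amp)))
          for _ in range(P)]
h = cn01(rng)

# Noisy observations, and a noise-free reference for comparison.
r_noisy = [one_bit_quantize(x * h + cn01(rng)) for x in pilots]
r_clean = [one_bit_quantize(x * h) for x in pilots]

h_hat_noisy = ls_estimate(pilots, r_noisy, rho)
h_hat_clean = ls_estimate(pilots, r_clean, rho)
phase_error_clean = abs(cmath.phase(h_hat_clean * h.conjugate()))
```

In the noise-free case every term of the sum is identical, so the estimate has magnitude $\sqrt{2}$ and a phase equal to the multiple of $\pi/2$ closest to the phase of $h$. The noise is what lets the average over pilots resolve the channel more finely.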


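Under the assumption that the $T - P$ data symbols are drawn
independently from the same input distribution, one then obtains
the following lower bound on $C$:

$$C(\rho) \geq \frac{T-P}{T}\, I(x; r \mid \hat{h}) \tag{11}$$

$$= 2\,\frac{T-P}{T} \left[1 - \sum_{\ell=0}^{P} \binom{P}{\ell} \beta(\ell, P-\ell)\, H_b\!\left(\frac{\beta(\ell+1, P-\ell)}{\beta(\ell, P-\ell)}\right)\right]. \tag{12}$$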


Here, $H_b(\cdot)$ denotes the binary entropy function. The inequality
in (11) follows from standard manipulations on the mutual
information (see, e.g., [20]); the equality (12) is proven in
Appendix A.

Fig. 2. Comparison of capacity and the pilot-based LS-estimation lower
bound (12) for the SISO case when $\rho = 10$ dB.
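The pilot-based bound can be evaluated and optimized over the number of pilots with a short script. The sketch below is illustrative and assumes the form $2\frac{T-P}{T}\bigl[1 - \sum_{\ell=0}^{P}\binom{P}{\ell}\beta(\ell, P-\ell)\,H_b\bigl(\beta(\ell+1, P-\ell)/\beta(\ell, P-\ell)\bigr)\bigr]$ for the bound, with $\beta$ computed by trapezoidal integration. All function names are hypothetical. It also compares the bound with the QPSK rate for $T = 2$.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

STEP = 0.02
NODES = [-8.0 + i * STEP for i in range(801)]
WEIGHTS = [STEP * math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)
           for g in NODES]

def beta(a, b, rho):
    """E_g[Phi(-g sqrt(rho))^a * Phi(g sqrt(rho))^b], g ~ N(0, 1)."""
    s = math.sqrt(rho)
    return sum(w * Phi(-g * s) ** a * Phi(g * s) ** b
               for g, w in zip(NODES, WEIGHTS))

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def pilot_bound(rho, T, P):
    """Pilot-based LS lower bound for P pilots out of T channel uses."""
    acc = 0.0
    for l in range(P + 1):
        b = beta(l, P - l, rho)
        if b > 0.0:
            # Crossover probability of a data symbol given l pilot mismatches.
            eps = beta(l + 1, P - l, rho) / b
            acc += math.comb(P, l) * b * binary_entropy(eps)
    return 2.0 * (T - P) / T * (1.0 - acc)

def best_pilot_bound(rho, T):
    """Maximize the bound over P = 1, ..., T - 1; returns (rate, P)."""
    return max((pilot_bound(rho, T, P), P) for P in range(1, T))

def r_qpsk(rho, T):
    """Rate achievable with QPSK when the channel is not known a priori."""
    total = 0.0
    for k in range(T + 1):
        b = beta(k, T - k, rho)
        if b > 0.0:
            total += math.comb(T, k) * b * math.log2(b)
    return 2.0 + 2.0 * total / T

rho = 10.0  # 10 dB
gap_T2 = abs(pilot_bound(rho, 2, 1) - r_qpsk(rho, 2))
rate_T8, pilots_T8 = best_pilot_bound(rho, 8)
```

For $T = 2$ the only admissible choice is $P = 1$, and the two expressions agree up to numerical precision. For larger $T$ the optimized bound stays below the QPSK rate.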

In Fig. 2, we plot the capacity (6) and the pilot-based LS-estimation
lower bound (12) for the case $\rho = 10$ dB. Note that,
for this $\rho$ value, $C(\rho) = R_{\mathrm{QPSK}}(\rho)$ for all $T$. The number of
pilots in (12) is optimized for each value of $T$. For reference, we
also depict the perfect receiver-CSI capacity [?]. As $T$ grows
large, the gap between (6) and the perfect receiver-CSI capacity
decreases. The gap between the pilot-based LS-estimation lower
bound (12) and capacity (6) is essentially constant over the
considered range of $T$ values. One exception is the case $T = 2$,
for which there seems to be no gap. Indeed, the following result
holds (see [?, Lem. 1]).

Lemma 1: The RHS of (12) coincides with the rate achievable
with QPSK (7) for the case $T = 2$.

It is well known that the pilot-based lower bound (12) can be
improved by using also the channel outputs corresponding to the
data symbols to refine the channel estimate [18]-[20]. This
approach is sometimes referred to as JPD processing. Perhaps
surprisingly, for the one-bit quantization case, LS estimation
combined with JPD processing turns out to be optimal, as
formalized in the following theorem.

Theorem 2: For the channel (5), LS estimation combined with
JPD processing achieves the channel capacity (6).

Proof: See Appendix B. ■

This result implies that if one allows for JPD processing, 
LS channel estimation is optimal, and there is no need to use 
more sophisticated channel-estimation techniques such as the 
one recently proposed in [27].

IV. Massive MIMO Case 

Motivated by the results obtained for the SISO case, we now 
assess the rates achievable with LS estimation in a multiuser 
massive MIMO uplink scenario. To limit the receiver complexity,
we shall only consider the pilot-based version of the LS estimation
algorithm (no JPD processing). Indeed, JPD processing is in
general computationally demanding [20] and may not be suitable
for massive MIMO. We shall also assume that the receiver
employs MRC to separate the information streams associated 
with the different users. 
Fig. 3. Single-user MRC outputs with LS channel estimation as a function of
the number of receive antennas and the SNR; 16-QAM inputs. Panels:
(a) $N = 40$ antennas, $\rho = 0$ dB; (b) $N = 400$ antennas, $\rho = 0$ dB;
(c) $N = 400$ antennas, $\rho = 20$ dB; (d) $N = 400$ antennas, $\rho = 20$ dB,
fully correlated channel coefficients.

We assume that the users coordinate the pilot-transmission
phase. Specifically, they transmit their pilot sequences in a round-robin
fashion. The channel estimates are then used to obtain
the MRC coefficients. Differently from [?], where a similar
setup is considered, we focus on high-order modulations and
not only on QPSK. Indeed, although QPSK is optimal in the
SISO case, the use of multiple antennas at the receiver opens
up the possibility to use high-order modulation formats. This is
demonstrated in Fig. 3, where we plot the MRC receiver output
corresponding to 16-QAM data symbols for the case when a
single user, alone in the cell, also transmits $P = 20$ pilots to let
the BS acquire LS channel estimates. As the size of the receiver
antenna array increases, the 16-QAM constellation becomes
progressively distinguishable (see Fig. 3b), provided that $\rho$ is
not too high.

Additive noise is one of the factors that enables the detection 
of the 16-QAM constellation; the other is the independent phase 
of the fading coefficients associated with each receive antenna. 
Recall in fact that, due to the one-bit quantizer, the quantized 
output at each receive antenna belongs to the set $\mathcal{R}$ of cardinality
4. These 4 possible channel outputs are then averaged by the
MRC filter to produce a channel output (a scalar) that belongs
to an alphabet with much higher cardinality. The cardinality 
depends on the number of pilots and receive antennas. The key 
observation is that the inner points of the 16-QAM constellation 
are more likely to be erroneously detected. This results in a 
smaller averaged value after MRC than for the outer constellation 
points. 
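The amplitude-recovery effect described above can be reproduced with a small Monte Carlo experiment. The sketch below is illustrative and not the authors' simulation code. It assumes a single user, i.i.d. CN(0, 1) fading per antenna, QPSK pilots of power $\rho$, an LS estimate normalized by $1/(P\sqrt{\rho})$ (an assumed normalization), and an MRC output defined as the average of $\hat{h}_n^* r_n$ over the $N$ antennas. It compares the average MRC output magnitude for an innermost and an outermost 16-QAM point. All function names are hypothetical.

```python
import math
import random

def one_bit_quantize(y):
    return complex(1.0 if y.real >= 0 else -1.0,
                   1.0 if y.imag >= 0 else -1.0)

def cn01(rng):
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def mrc_output(x, N, rho, P, rng):
    """MRC output for data symbol x, with per-antenna LS estimates."""
    amp = math.sqrt(rho / 2.0)
    z = 0j
    for _ in range(N):
        h = cn01(rng)
        h_hat = 0j
        for _ in range(P):  # pilot phase with QPSK pilots
            p = complex(rng.choice((amp, -amp)), rng.choice((amp, -amp)))
            h_hat += p.conjugate() * one_bit_quantize(p * h + cn01(rng))
        h_hat /= P * math.sqrt(rho)
        # Data phase: combine the quantized output with the estimate.
        z += h_hat.conjugate() * one_bit_quantize(x * h + cn01(rng))
    return z / N

def mean_magnitude(x, N, rho, P, trials, rng):
    return sum(abs(mrc_output(x, N, rho, P, rng))
               for _ in range(trials)) / trials

def inner_to_outer_ratio(rho, N=400, P=20, trials=10, seed=0):
    """Ratio of average |MRC output| for inner vs. outer 16-QAM points."""
    rng = random.Random(seed)
    unit = math.sqrt(rho / 10.0)  # 16-QAM scaled to average power rho
    inner = complex(1, 1) * unit
    outer = complex(3, 3) * unit
    return (mean_magnitude(inner, N, rho, P, trials, rng)
            / mean_magnitude(outer, N, rho, P, trials, rng))

ratio_0dB = inner_to_outer_ratio(rho=1.0)     # noise helps: rings separate
ratio_20dB = inner_to_outer_ratio(rho=100.0)  # negligible noise: circle
```

At 0 dB the inner point yields a clearly smaller average magnitude than the outer one, so the amplitude levels remain distinguishable after combining. At 20 dB the ratio approaches one, consistent with the collapse of the constellation onto a circle.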

To highlight the importance of the additive noise, we consider
in Fig. 3c the case $\rho = 20$ dB. Since the additive noise is negligible,
all 16-QAM constellation points are detected correctly
with high probability. As a result, the output of the MRC filter
lies approximately on a circle, which suggests that the amplitude
of the transmitted signal cannot be used to convey information.
When the noise is negligible and all fading coefficients are fully
correlated, the constellation collapses to a noisy QPSK diagram
(Fig. 3d). In this case, high-order modulations are not supported
by the channel.

Fig. 4. Per-user achievable rate with LS estimation and MRC as a function of
$\rho$; $T = 1000$, $N = 400$; the number of pilots $P$ is optimized for each value of
$\rho$.

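The achievable rate $R_{\mathrm{MRC}}^{(k)}$ for user $k = 1, \ldots, K$ with LS
estimation and MRC is

$$R_{\mathrm{MRC}}^{(k)}(\rho) = \frac{T-P}{T}\, I\bigl(x_k; \hat{x}_k \mid \hat{\mathbf{H}}\bigr). \tag{13}$$

Here, $x_k$ denotes the symbol transmitted by user $k$, $\hat{x}_k$ the
corresponding MRC receiver output, and $\hat{\mathbf{H}}$ the LS channel estimate.
The mutual information between the channel input and the
MRC receiver output can be computed by mapping $\hat{x}_k$ to
points over a regular grid in the complex plane as described in [?].
With this technique, one obtains a lower bound on $R_{\mathrm{MRC}}^{(k)}(\rho)$ [28,
p. 3503] that becomes increasingly tight as the grid spacing is
driven to zero. The conditional probability mass functions needed
for the evaluation of the mutual information are computed using
Monte-Carlo techniques.* Since all the users in the system are
assumed to be statistically equivalent, we have that

$$C(\rho) \geq K R_{\mathrm{MRC}}^{(k)}(\rho). \tag{14}$$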
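The grid-based evaluation of the mutual information can be sketched as follows. This is a generic plug-in estimator written for illustration, not the authors' published routines. It assumes equiprobable input symbols and takes, for each symbol, a list of simulated complex receiver outputs. The names `grid_cell` and `gridded_mutual_information` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def grid_cell(z, spacing):
    """Index of the square grid cell of side `spacing` that contains z."""
    return (math.floor(z.real / spacing), math.floor(z.imag / spacing))

def gridded_mutual_information(samples_by_symbol, spacing):
    """Plug-in estimate (in bits) of I(X; cell(Z)) for equiprobable X.

    samples_by_symbol maps each input symbol to a list of complex
    Monte Carlo samples of the receiver output Z given that symbol.
    """
    M = len(samples_by_symbol)
    conditional = {}
    marginal = defaultdict(float)
    for x, samples in samples_by_symbol.items():
        counts = Counter(grid_cell(z, spacing) for z in samples)
        pmf = {cell: n / len(samples) for cell, n in counts.items()}
        conditional[x] = pmf
        for cell, p in pmf.items():
            marginal[cell] += p / M
    mi = 0.0
    for pmf in conditional.values():
        for cell, p in pmf.items():
            mi += (p / M) * math.log2(p / marginal[cell])
    return mi

# Four well-separated clusters: the input is recovered perfectly.
separated = {s: [s * 10 + complex(0.1, 0.1)] * 50
             for s in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)}
mi_separated = gridded_mutual_information(separated, spacing=1.0)

# Identical output samples for every input: no information is conveyed.
identical = {s: [complex(0.3, 0.3), complex(1.2, -0.4)] * 25
             for s in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)}
mi_identical = gridded_mutual_information(identical, spacing=1.0)
```

Quantizing the output onto a grid cannot increase the mutual information, which is why the exact gridded quantity is a lower bound. The Monte Carlo plug-in estimate, however, is biased upward when the number of samples per cell is small, so the sample size has to grow as the grid spacing shrinks.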


In Fig. 4, we compare the rates achievable with QPSK and
16-QAM as a function of $\rho$. The number of receive antennas is
$N = 400$, the coherence time is $T = 1000$, and we consider
both the case when the number of users $K$ is 1 and 20. The
number of transmitted pilots is optimized for every $\rho$ value. We
see that 16-QAM outperforms QPSK already at SNR values
as low as -15 dB. Furthermore, the full 16-QAM rate of 4
bits per channel use can be achieved in the single-user case
for large SNR values. Note that if $\rho$ is further increased, the
16-QAM rate starts decreasing, because the constellation collapses
to a circle (cf. Fig. 3c). Note also that, when QPSK is used,
the difference in achievable rate between the case $K = 1$ and
$K = 20$ is marginal, an observation already reported in [?].
On the contrary, the difference in achievable rates between
single- and multi-user case is more pronounced when 16-QAM
is used. This suggests that, with 16-QAM, the system becomes
interference limited, and that the one-bit quantizers partly destroy
the orthogonality between the fading channels associated with
different users.

* The numerical routines that are used to evaluate (13) can be downloaded at
https://github.com/infotheorychalmers/one-bit_massive_MIMO

Fig. 5. Per-user achievable rate with LS estimation and MRC as a
function of $T$; $N = 400$, $\rho = -10$ dB, $K = 20$; the number of pilots $P$ is
optimized for each value of $T$.

Fig. 6. Per-user achievable rates as a function of the number of antennas;
$T = 1000$, $\rho = -10$ dB; the number of pilots $P$ is optimized for each value
of $N$.

In Fig. 5, we plot the per-user achievable rates as a function
of $T$ for $\rho = -10$ dB, $N = 400$, and $K = 20$. The number
of pilot symbols is again optimized for each value of $T$. We
also depict the achievable rates for the perfect receiver-CSI
case. Similarly to the SISO case, as $T$ increases, the per-user
achievable rates approach the perfect receiver-CSI rate. However,
this convergence occurs at a much slower pace than for the
infinite-precision case (cf. [?], [?]). This suggests that the
one-bit ADC architecture may be unsuitable for high-mobility
scenarios. Note also that the achievable rate is zero when $T < 20$.
In fact, when orthogonal pilot sequences are transmitted, at least
20 pilot symbols are required when $K = 20$.

Finally, in Fig. 6, we plot the per-user achievable rates as
a function of the number of antennas. Here, $\rho = -10$ dB,
and $T = 1000$. As in the previous cases, the number of pilot
symbols is optimized for each value of $N$. We note that 16-QAM
outperforms QPSK also when the number of receive antennas is
much smaller than 400. We note also that, when QPSK is used,
the achievable rate saturates rapidly as the number of receive
antennas is increased.


V. Conclusions 

We have analyzed the performance of a one-bit quantized 
receiver architecture operating over a Rayleigh block-fading 
channel whose realizations are not known a priori to transmitter 
and receiver. We have demonstrated that, for the SISO case, a 
signaling scheme based on LS estimation and JPD processing 
is capacity achieving. For the one-bit massive MIMO case, 
we have shown that, in contrast to the SISO case, high-order 
constellations can be used to convey information at higher 
rates than with QPSK. This holds in spite of the nonlinearity 
introduced by the one-bit quantizer and in spite of the multiuser 
interference. Similar results hold for the case when zero-forcing 
instead of MRC is used (see [?]). Note also that constellations
that are optimized for the nonlinearity introduced by the one-bit 
quantizers may yield higher achievable rates than the 16-QAM 
constellation analyzed in this paper. Extension of our analysis 
to the case when the ADCs have 2 or 3 bits of resolution is also 
of interest. 


Appendix A 
Proof of (12) 

Both the pilot and the data symbols are assumed to belong
to a QPSK constellation. By symmetry, the rates achievable on
the SISO channel (5) with QPSK inputs are twice as high as
the rates achievable with BPSK. Hence, in the remainder of the
proof, we shall consider a real-valued version of the channel
input-output relation (5), where $h$ and $w$ are real Gaussian, and
the input vector $\mathbf{x}$ consists of BPSK symbols. Let $\ell$ denote the
number of sign mismatches between the BPSK vector $\mathbf{x}^{(P)}$ and
the quantized received vector $\mathbf{r}^{(P)}$. Since for the real case there
exists a one-to-one relation between the LS estimate $\hat{h}$ in (10)
and $\ell$, we conclude that $I(x; r \mid \hat{h}) = I(x; r \mid \ell)$. To evaluate this
mutual information, we need the conditional probability mass
function $p_{r \mid x, \ell}$, which can be expressed as follows:

$$p_{r \mid x, \ell}(r \mid x, \ell) = \frac{\mathbb{E}_h\left[p_{\ell \mid h}(\ell \mid h)\, \Phi(r h x)\right]}{p_{\ell}(\ell)}. \tag{15}$$


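Using a similar approach as the one detailed in [14], one can
show that

$$p_{\ell \mid h}(\ell \mid h) = \binom{P}{\ell} \Phi(-h\sqrt{\rho})^{\ell}\, \Phi(h\sqrt{\rho})^{P-\ell} \tag{16}$$

and

$$p_{\ell}(\ell) = \binom{P}{\ell} \beta(\ell, P-\ell). \tag{17}$$

Substituting (16) and (17) into (15), and then using the definition
of mutual information, and that

$$\beta(\ell+1, P-\ell) + \beta(\ell, P-\ell+1) = \beta(\ell, P-\ell) \tag{18}$$

one obtains (12) (see [?, App. B] for details).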
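The distribution of the number of pilot sign mismatches can be checked against a direct simulation of the real-valued channel. The sketch below is illustrative and makes the following assumptions explicit: the real-valued model is $r = \mathrm{sign}(\sqrt{\rho}\,h x + w)$ with $h, w \sim \mathcal{N}(0, 1)$ and $x \in \{-1, +1\}$, and the mismatch pmf is $\binom{P}{\ell}\beta(\ell, P-\ell)$. The function names are hypothetical.

```python
import math
import random

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

STEP = 0.02
NODES = [-8.0 + i * STEP for i in range(801)]
WEIGHTS = [STEP * math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)
           for g in NODES]

def beta(a, b, rho):
    """E_g[Phi(-g sqrt(rho))^a * Phi(g sqrt(rho))^b], g ~ N(0, 1)."""
    s = math.sqrt(rho)
    return sum(w * Phi(-g * s) ** a * Phi(g * s) ** b
               for g, w in zip(NODES, WEIGHTS))

def mismatch_pmf(P, rho):
    """Analytical pmf of the number of sign mismatches among P pilots."""
    return [math.comb(P, l) * beta(l, P - l, rho) for l in range(P + 1)]

def simulated_mismatch_pmf(P, rho, trials, rng):
    """Empirical pmf from the assumed real-valued one-bit channel."""
    counts = [0] * (P + 1)
    for _ in range(trials):
        h = rng.gauss(0.0, 1.0)  # one fading realization per block
        mismatches = 0
        for _ in range(P):
            x = rng.choice((1.0, -1.0))
            y = math.sqrt(rho) * h * x + rng.gauss(0.0, 1.0)
            r = 1.0 if y >= 0 else -1.0
            mismatches += (r != x)
        counts[mismatches] += 1
    return [c / trials for c in counts]

P, rho = 4, 2.0
analytical = mismatch_pmf(P, rho)
empirical = simulated_mismatch_pmf(P, rho, 20000, random.Random(3))
max_deviation = max(abs(a - e) for a, e in zip(analytical, empirical))

# Recursion used in the proof: beta(l+1, P-l) + beta(l, P-l+1) = beta(l, P-l).
recursion_gap = max(abs(beta(l + 1, P - l, rho) + beta(l, P - l + 1, rho)
                        - beta(l, P - l, rho)) for l in range(P + 1))
```

The pmf is U-shaped at moderate and high SNR: because the sign of $h$ is unknown, almost all pilots match or almost all mismatch, which is why the mismatch count carries the same information as the LS estimate in the real-valued case.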


Appendix B 
Proof of Theorem 2 

Our JPD processing lower bound is based on the following
scheme. The first transmit symbol in each coherence block is a
pilot symbol. To decode the $n$th symbol in the block, we rely on
the LS channel estimate obtained on the basis of the past $n - 1$
symbols (1 pilot symbol and $n - 2$ data symbols). This scheme
yields the following achievable rate:

$$R_{\mathrm{JPD}}^{(T)}(\rho) = \frac{1}{T} \sum_{n=2}^{T} I\bigl(x_n; r_n \mid \hat{h}(n-1)\bigr). \tag{19}$$

Here, we have indicated the channel estimate by $\hat{h}(n-1)$ to
clarify the number of input symbols that are used to estimate
the channel. As in Appendix A, it is sufficient to focus on a real-valued
version of the channel input-output relation (5), where
$h$ and $w$ are real Gaussian, and the input vector $\mathbf{x}$ consists of
BPSK symbols.

To establish Theorem 2, it is then sufficient to show that (19)
coincides with

$$R_{\mathrm{BPSK}}^{(T)}(\rho) = 1 + \frac{1}{T} \sum_{k=0}^{T} \binom{T}{k} \beta(k, T-k) \log_2 \beta(k, T-k). \tag{20}$$

The final result is then established by replacing BPSK with on-off
BPSK. See [?] for the details. Our proof that (19) coincides
with (20) is by induction. We start by noting that when $T = 2$,
(19) coincides with the RHS of (12). Equality between (19)
and (20) for this case then follows from Lemma 1.

We now assume that JPD processing achieves (20) for a given
coherence time $T$. We need to prove that the same holds when
the coherence time is $T + 1$. Note that


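$$R_{\mathrm{JPD}}^{(T+1)}(\rho) = \frac{1}{T+1} \left[T R_{\mathrm{JPD}}^{(T)}(\rho) + I\bigl(x_{T+1}; r_{T+1} \mid \hat{h}(T)\bigr)\right]. \tag{21}$$

By the induction hypothesis, we can replace $R_{\mathrm{JPD}}^{(T)}(\rho)$ by
$R_{\mathrm{BPSK}}^{(T)}(\rho)$. Furthermore, we can replace the mutual information
on the RHS of (21) with the mutual information evaluated in
Appendix A for $P = T$.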
The desired result then follows by using (18), the following
binomial equality

$$\binom{T+1}{k} = \binom{T}{k} + \binom{T}{k-1} \tag{22}$$

and by performing simple algebraic manipulations.
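The claim of the induction argument can be checked numerically for small coherence times. The sketch below is illustrative and works with the real-valued (BPSK) model. It assumes that the mutual information available when $m$ past symbols are known is $1 - \sum_{\ell=0}^{m}\binom{m}{\ell}\beta(\ell, m-\ell)\,H_b\bigl(\beta(\ell+1, m-\ell)/\beta(\ell, m-\ell)\bigr)$, and compares the resulting JPD rate with the BPSK rate $1 + (1/T)\sum_k \binom{T}{k}\beta(k, T-k)\log_2\beta(k, T-k)$. All function names are hypothetical.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

STEP = 0.02
NODES = [-8.0 + i * STEP for i in range(801)]
WEIGHTS = [STEP * math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)
           for g in NODES]

def beta(a, b, rho):
    """E_g[Phi(-g sqrt(rho))^a * Phi(g sqrt(rho))^b], g ~ N(0, 1)."""
    s = math.sqrt(rho)
    return sum(w * Phi(-g * s) ** a * Phi(g * s) ** b
               for g, w in zip(NODES, WEIGHTS))

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def conditional_information(rho, m):
    """I(x; r | estimate from m known past symbols), real-valued model."""
    acc = 0.0
    for l in range(m + 1):
        b = beta(l, m - l, rho)
        if b > 0.0:
            acc += (math.comb(m, l) * b
                    * binary_entropy(beta(l + 1, m - l, rho) / b))
    return 1.0 - acc

def r_jpd(rho, T):
    """JPD rate: symbol n is decoded using the n - 1 symbols before it."""
    return sum(conditional_information(rho, n - 1)
               for n in range(2, T + 1)) / T

def r_bpsk(rho, T):
    """Rate achievable with BPSK over the real-valued one-bit channel."""
    total = 0.0
    for k in range(T + 1):
        b = beta(k, T - k, rho)
        if b > 0.0:
            total += math.comb(T, k) * b * math.log2(b)
    return 1.0 + total / T

rho = 3.0
gaps = [abs(r_jpd(rho, T) - r_bpsk(rho, T)) for T in range(2, 7)]
```

The two rates agree up to numerical precision for every tested $T$, which is consistent with the chain-rule interpretation of the scheme: each symbol is decoded with all earlier symbols acting as pilots, so no mutual information is left unused.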